Odds Ratio: Interpretation + Hypothesis Test (2×2 tables)#
The odds ratio (OR) is a measure of association between two binary variables (e.g., treatment vs control and event vs no event).
This notebook focuses on:
what odds and odds ratios mean (and what they do not mean)
how to test the null hypothesis OR = 1 (“no association”)
a low-level NumPy implementation (no stats libraries)
an exact alternative (Fisher’s exact test) for small samples
import math
import platform
import sys
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
rng = np.random.default_rng(42)
print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())
print("NumPy:", np.__version__)
print("Plotly:", plotly.__version__)
Python: 3.12.9
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.35
NumPy: 1.26.2
Plotly: 6.5.2
1) Odds vs probability#
A probability \(p\) lives in \([0,1]\).
The corresponding odds are:
\[ \text{odds} = \frac{p}{1-p} \in [0, \infty) \]
Interpretation:
odds = 1 means “event” and “no event” are equally likely (\(p=0.5\))
odds = 3 means the event is 3× as likely as the non-event
The odds ratio compares odds between two groups:
\[ \text{OR} = \frac{\text{odds}_1}{\text{odds}_0} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} \]
Important: an OR multiplies odds, not probabilities. The same OR can imply very different probability changes depending on the baseline probability.
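As a quick sanity check (a minimal sketch, not one of the notebook's original cells), the probability/odds conversion round-trips cleanly:

```python
# Convert probability -> odds and back
def to_odds(p: float) -> float:
    return p / (1.0 - p)

def to_prob(odds: float) -> float:
    return odds / (1.0 + odds)

for p in (0.5, 0.75, 0.9):
    o = to_odds(p)
    print(f"p={p:.2f} -> odds={o:.2f} -> back to p={to_prob(o):.2f}")
# p=0.50 maps to odds=1.00, p=0.75 to odds=3.00, p=0.90 to odds=9.00
```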
# Visualize probability -> odds and probability -> log-odds (logit)
p = np.linspace(0.001, 0.999, 800)
odds = p / (1.0 - p)
log_odds = np.log(odds)
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=("Odds = p/(1-p)", "Log-odds = log(p/(1-p))"),
)
fig.add_trace(go.Scatter(x=p, y=odds, mode="lines", name="odds"), row=1, col=1)
fig.update_yaxes(type="log", title_text="odds (log scale)", row=1, col=1)
fig.update_xaxes(title_text="probability p", row=1, col=1)
fig.add_trace(go.Scatter(x=p, y=log_odds, mode="lines", name="log-odds"), row=1, col=2)
fig.update_yaxes(title_text="log-odds", row=1, col=2)
fig.update_xaxes(title_text="probability p", row=1, col=2)
# Helpful reference line: p=0.5 -> odds=1 -> log-odds=0
for col in [1, 2]:
fig.add_vline(x=0.5, line_dash="dash", line_color="gray", row=1, col=col)
fig.update_layout(title="Odds are an unbounded re-parameterization of probability", showlegend=False, height=420)
fig.show()
2) Odds ratio from a 2×2 table#
For two binary variables (e.g., Exposure and Outcome), summarize counts in a 2×2 table:
| | Outcome = 1 | Outcome = 0 |
|---|---|---|
| Exposure = 1 | \(a\) | \(b\) |
| Exposure = 0 | \(c\) | \(d\) |
Group odds:
\[ \text{odds}_1 = \frac{a}{b}, \qquad \text{odds}_0 = \frac{c}{d} \]
Odds ratio:
\[ \text{OR} = \frac{a/b}{c/d} = \frac{ad}{bc} \]
OR = 1: equal odds (no association)
OR > 1: higher odds of outcome in the exposed group
OR < 1: lower odds of outcome in the exposed group
If any of \(a,b,c,d\) are 0, the raw OR can be 0 or \(\infty\). A common small-sample fix is the Haldane–Anscombe correction: add 0.5 to every cell before computing OR and standard errors.
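For example, with a hypothetical zero-cell table (not one used later in the notebook), the raw OR is infinite while the corrected OR is finite:

```python
import numpy as np

# Hypothetical table with a zero cell: b = 0 makes the raw OR infinite
zero_table = np.array([[10.0, 0.0],
                       [5.0, 15.0]])

a, b = zero_table[0]
c, d = zero_table[1]
raw_or = (a * d) / (b * c)  # float division by zero -> inf (NumPy warns)

# Haldane-Anscombe: add 0.5 to every cell before computing the OR
a2, b2, c2, d2 = (zero_table + 0.5).ravel()
corrected_or = (a2 * d2) / (b2 * c2)

print("raw OR:      ", raw_or)        # inf
print("corrected OR:", corrected_or)
```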
# Example 2×2 table
# Rows: [exposed, unexposed]
# Cols: [event, no event]
table = np.array([[30, 70],
[15, 85]], dtype=float)
a, b = table[0]
c, d = table[1]
p1 = a / (a + b)
p0 = c / (c + d)
odds1 = a / b
odds0 = c / d
OR = odds1 / odds0
print(f"P(event | exposed) = {p1:.3f}")
print(f"P(event | unexposed) = {p0:.3f}")
print(f"odds(exposed) = {odds1:.3f}")
print(f"odds(unexposed) = {odds0:.3f}")
print(f"odds ratio (OR) = {OR:.3f}")
fig = go.Figure(
data=go.Heatmap(
z=table,
x=["Event", "No event"],
y=["Exposed", "Unexposed"],
colorscale="Blues",
text=table.astype(int),
texttemplate="%{text}",
hovertemplate="%{y} / %{x}<br>count=%{z}<extra></extra>",
)
)
fig.update_layout(title="Observed 2×2 counts", height=360)
fig.show()
P(event | exposed) = 0.300
P(event | unexposed) = 0.150
odds(exposed) = 0.429
odds(unexposed) = 0.176
odds ratio (OR) = 2.429
3) What hypothesis is being tested?#
In a 2×2 table, the most common null hypothesis is:
\[ H_0: \text{OR} = 1 \quad \text{vs.} \quad H_1: \text{OR} \neq 1 \]
This is equivalent to independence between exposure and outcome (no association).
A typical workflow:
summarize your data as a 2×2 table
compute an estimate of OR (effect size)
compute a p-value and/or confidence interval for OR
Two common ways to test \(H_0\):
Large-sample Wald test using \(\log(\widehat{\text{OR}})\) (fast, approximate)
Fisher’s exact test (exact under fixed margins; better for small counts)
Large-sample Wald test on the log-odds-ratio#
We test on the log scale because OR is positive and skewed, while \(\log(\text{OR})\) is often close to normal.
Estimate:
\[ \log(\widehat{\text{OR}}) = \log\frac{ad}{bc} \]
Approximate standard error:
\[ \widehat{\text{SE}} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]
Test statistic (approximately standard normal under \(H_0\)):
\[ z = \frac{\log(\widehat{\text{OR}})}{\widehat{\text{SE}}} \]
Two-sided p-value:
\[ p = 2\left(1 - \Phi(|z|)\right) \]
A \((1-\alpha)\) confidence interval for OR:
\[ \exp\!\left( \log(\widehat{\text{OR}}) \pm z_{1-\alpha/2}\,\widehat{\text{SE}} \right) \]
Notes:
This approximation is usually OK when all cells are reasonably large (rule of thumb: ~5+).
If any cell is very small (or 0), prefer Fisher’s exact test and/or use a continuity correction.
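Worked through on the table from section 2, as a self-contained sketch (using `math.erf` for \(\Phi\) here, rather than the NumPy-only approximation implemented below):

```python
import math

a, b, c, d = 30, 70, 15, 85            # the 2x2 table from section 2

log_or = math.log((a * d) / (b * c))   # log(2.4286) ~ 0.887
se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # ~ 0.355
z = log_or / se

# Standard normal CDF via the error function
def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_two_sided = 2.0 * (1.0 - Phi(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```

This matches the Wald results computed with the NumPy-only helpers defined next.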
# --- Normal CDF/PPF approximations (NumPy-only) ---
SQRT_2PI = np.sqrt(2.0 * np.pi)
def normal_pdf(x: np.ndarray) -> np.ndarray:
x = np.asarray(x, dtype=float)
return np.exp(-0.5 * x * x) / SQRT_2PI
def normal_cdf(x: np.ndarray) -> np.ndarray:
# Approximate Φ(x) using a classic 5th-order rational approximation.
x = np.asarray(x, dtype=float)
# Abramowitz & Stegun style approximation
p = 0.2316419
b1 = 0.319381530
b2 = -0.356563782
b3 = 1.781477937
b4 = -1.821255978
b5 = 1.330274429
ax = np.abs(x)
t = 1.0 / (1.0 + p * ax)
poly = (((((b5 * t + b4) * t + b3) * t + b2) * t + b1) * t)
cdf_pos = 1.0 - normal_pdf(ax) * poly
return np.where(x >= 0.0, cdf_pos, 1.0 - cdf_pos)
def normal_ppf(p: np.ndarray) -> np.ndarray:
# Approximate Φ^{-1}(p) (Acklam's approximation).
p = np.asarray(p, dtype=float)
if np.any((p <= 0.0) | (p >= 1.0)):
raise ValueError("p must be in (0, 1)")
# Coefficients in rational approximations
a = np.array([
-3.969683028665376e01,
2.209460984245205e02,
-2.759285104469687e02,
1.383577518672690e02,
-3.066479806614716e01,
2.506628277459239e00,
])
b = np.array([
-5.447609879822406e01,
1.615858368580409e02,
-1.556989798598866e02,
6.680131188771972e01,
-1.328068155288572e01,
])
c = np.array([
-7.784894002430293e-03,
-3.223964580411365e-01,
-2.400758277161838e00,
-2.549732539343734e00,
4.374664141464968e00,
2.938163982698783e00,
])
d = np.array([
7.784695709041462e-03,
3.224671290700398e-01,
2.445134137142996e00,
3.754408661907416e00,
])
plow = 0.02425
phigh = 1.0 - plow
x = np.empty_like(p)
# Lower region
m = p < plow
q = np.sqrt(-2.0 * np.log(p[m]))
x[m] = -(
(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
/ ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
)
# Upper region
m = p > phigh
q = np.sqrt(-2.0 * np.log(1.0 - p[m]))
x[m] = (
(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
/ ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
)
# Central region
m = (~(p < plow)) & (~(p > phigh))
q = p[m] - 0.5
r = q * q
x[m] = (
(((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q
/ (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1.0)
)
return x
def odds_ratio_wald_test(
table_2x2: np.ndarray,
correction: float = 0.0,
alternative: str = "two-sided",
confidence_level: float = 0.95,
):
# Wald test and CI for OR in a 2×2 table.
table_2x2 = np.asarray(table_2x2, dtype=float)
if table_2x2.shape != (2, 2):
raise ValueError("table_2x2 must have shape (2, 2)")
a, b = table_2x2[0] + correction
c, d = table_2x2[1] + correction
log_or = np.log(a) + np.log(d) - np.log(b) - np.log(c)
se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
z = log_or / se
if alternative == "two-sided":
p_value = 2.0 * (1.0 - normal_cdf(np.abs(z)))
elif alternative == "greater":
p_value = 1.0 - normal_cdf(z)
elif alternative == "less":
p_value = normal_cdf(z)
else:
raise ValueError("alternative must be: 'two-sided', 'greater', or 'less'")
alpha = 1.0 - confidence_level
zcrit = normal_ppf(1.0 - alpha / 2.0)
ci_log = (log_or - zcrit * se, log_or + zcrit * se)
ci_or = (float(np.exp(ci_log[0])), float(np.exp(ci_log[1])))
return {
'odds_ratio': float(np.exp(log_or)),
'log_odds_ratio': float(log_or),
'se_log_odds_ratio': float(se),
'z': float(z),
'p_value': float(p_value),
'confidence_level': float(confidence_level),
'ci_or': ci_or,
}
res = odds_ratio_wald_test(table, correction=0.0)
res_cc = odds_ratio_wald_test(table, correction=0.5)
print("Wald test (no correction)")
print(res)
print("\nWald test (Haldane–Anscombe correction = 0.5)")
print(res_cc)
# Visualize the z statistic on a standard normal curve
z = res["z"]
zabs = abs(z)
x = np.linspace(-4.0, 4.0, 1200)
y = normal_pdf(x)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="N(0,1) pdf"))
# Shade the two tails beyond ±|z|
left = x <= -zabs
right = x >= zabs
fig.add_trace(go.Scatter(
x=x[left],
y=y[left],
mode="lines",
line=dict(color="rgba(214,39,40,0.0)"),
fill="tozeroy",
fillcolor="rgba(214,39,40,0.25)",
showlegend=False,
))
fig.add_trace(go.Scatter(
x=x[right],
y=y[right],
mode="lines",
line=dict(color="rgba(214,39,40,0.0)"),
fill="tozeroy",
fillcolor="rgba(214,39,40,0.25)",
showlegend=False,
))
fig.add_vline(x=-zabs, line_dash="dash", line_color="firebrick")
fig.add_vline(x= zabs, line_dash="dash", line_color="firebrick")
fig.update_layout(
title=f"Wald test: z = {z:.3f}, two-sided p = {res['p_value']:.4f}",
xaxis_title="z",
yaxis_title="density",
height=420,
)
fig.show()
A quick simulation check (why z ≈ N(0,1) under H0)#
Under \(H_0\) (no association), if sample sizes are large enough, the Wald statistic behaves like a standard normal.
We’ll simulate many 2×2 tables under independence and look at the distribution of the resulting z-statistics.
def simulate_tables_under_h0(
n_exposed: int,
n_unexposed: int,
p_event: float,
n_sims: int = 30_000,
):
# Simulate 2×2 tables where event is independent of exposure.
a = rng.binomial(n_exposed, p_event, size=n_sims)
c = rng.binomial(n_unexposed, p_event, size=n_sims)
b = n_exposed - a
d = n_unexposed - c
tables = np.stack([np.stack([a, b], axis=1), np.stack([c, d], axis=1)], axis=1)
return tables
def wald_z_for_tables(tables: np.ndarray, correction: float = 0.5) -> np.ndarray:
tables = np.asarray(tables, dtype=float)
a = tables[:, 0, 0] + correction
b = tables[:, 0, 1] + correction
c = tables[:, 1, 0] + correction
d = tables[:, 1, 1] + correction
log_or = np.log(a) + np.log(d) - np.log(b) - np.log(c)
se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
return log_or / se
tables_h0 = simulate_tables_under_h0(n_exposed=200, n_unexposed=200, p_event=0.2, n_sims=30_000)
zs = wald_z_for_tables(tables_h0, correction=0.5)
fig = px.histogram(zs, nbins=80, histnorm="probability density", opacity=0.7)
x = np.linspace(-4, 4, 600)
fig.add_trace(go.Scatter(x=x, y=normal_pdf(x), mode="lines", name="N(0,1) pdf"))
fig.update_layout(
title="Simulated z-statistics under H0 (independence)",
xaxis_title="z",
yaxis_title="density",
height=420,
)
fig.show()
print("mean(z):", float(np.mean(zs)))
print("std(z): ", float(np.std(zs)))
4) Exact alternative: Fisher’s exact test#
When counts are small (or a cell is 0), the Wald test can be inaccurate.
Fisher’s exact test computes an exact p-value for independence in a 2×2 table conditional on the row and column totals.
Under \(H_0\) and fixed margins, the top-left cell \(a\) follows a hypergeometric distribution. The p-value is computed by summing probabilities of tables at least as “surprising” as the observed one.
We’ll implement a standard two-sided Fisher p-value by summing probabilities of all tables with probability ≤ the observed table’s probability.
LGAMMA = np.vectorize(math.lgamma, otypes=[float])
def log_choose(n: int, k: np.ndarray) -> np.ndarray:
k = np.asarray(k, dtype=float)
# log(n!) = lgamma(n+1) is stable for large n
return LGAMMA(n + 1.0) - LGAMMA(k + 1.0) - LGAMMA(n - k + 1.0)
def fisher_exact(table_2x2: np.ndarray, alternative: str = "two-sided"):
table_2x2 = np.asarray(table_2x2, dtype=int)
if table_2x2.shape != (2, 2):
raise ValueError("table_2x2 must have shape (2, 2)")
a_obs, b_obs = table_2x2[0]
c_obs, d_obs = table_2x2[1]
r1 = a_obs + b_obs
r2 = c_obs + d_obs
c1 = a_obs + c_obs
c2 = b_obs + d_obs
n = r1 + r2
a_min = max(0, c1 - r2)
a_max = min(r1, c1)
a_vals = np.arange(a_min, a_max + 1)
# Hypergeometric probabilities with fixed margins
log_p = log_choose(c1, a_vals) + log_choose(c2, r1 - a_vals) - log_choose(n, r1)
# Stabilize before exponentiating
log_p -= np.max(log_p)
p = np.exp(log_p)
p /= np.sum(p)
# Observed probability
p_obs = float(p[a_vals == a_obs][0])
if alternative == "two-sided":
p_value = float(np.sum(p[p <= p_obs + 1e-15]))
elif alternative == "less":
p_value = float(np.sum(p[a_vals <= a_obs]))
elif alternative == "greater":
p_value = float(np.sum(p[a_vals >= a_obs]))
else:
raise ValueError("alternative must be: 'two-sided', 'greater', or 'less'")
return {
'a_support': a_vals,
'pmf': p,
'p_value': p_value,
'p_observed_table': p_obs,
}
fisher = fisher_exact(table, alternative="two-sided")
print("Fisher two-sided p-value:", fisher["p_value"])
# Visualize the exact null distribution of 'a' given the margins
colors = np.where(fisher["pmf"] <= fisher["p_observed_table"] + 1e-15, "crimson", "steelblue")
fig = go.Figure()
fig.add_trace(go.Bar(
x=fisher["a_support"],
y=fisher["pmf"],
marker_color=colors,
))
fig.add_vline(x=int(table[0, 0]), line_dash="dash", line_color="black")
fig.update_layout(
title=f"Fisher's exact test (two-sided p = {fisher['p_value']:.4f})",
xaxis_title="a (top-left cell count)",
yaxis_title="P(A=a | margins)",
height=420,
)
fig.show()
Fisher two-sided p-value: 0.017149022768408462
5) Interpreting OR in terms of probabilities (why OR can mislead)#
Because OR multiplies odds, the same OR can correspond to different probability changes.
If baseline probability is \(p_0\) and the odds ratio is OR, then:
baseline odds: \(\text{odds}_0 = \frac{p_0}{1-p_0}\)
new odds: \(\text{odds}_1 = \text{OR}\cdot\text{odds}_0\)
implied new probability:
\[ p_1 = \frac{\text{odds}_1}{1+\text{odds}_1} = \frac{\text{OR}\cdot p_0}{1 - p_0 + \text{OR}\cdot p_0} \]
We’ll plot \(p_1\) vs \(p_0\) for a few OR values, and also the risk ratio \(p_1/p_0\).
def p1_from_p0_and_or(p0: np.ndarray, odds_ratio: float) -> np.ndarray:
p0 = np.asarray(p0, dtype=float)
return (odds_ratio * p0) / (1.0 - p0 + odds_ratio * p0)
p0 = np.linspace(0.001, 0.999, 500)
ors = [0.5, 1.0, 2.0, 5.0]
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=("Implied p1 for a fixed OR", "Risk ratio p1/p0 for a fixed OR"),
)
for OR in ors:
p1 = p1_from_p0_and_or(p0, OR)
rr = p1 / p0
fig.add_trace(go.Scatter(x=p0, y=p1, mode="lines", name=f"OR={OR}"), row=1, col=1)
fig.add_trace(go.Scatter(x=p0, y=rr, mode="lines", name=f"OR={OR}", showlegend=False), row=1, col=2)
fig.update_xaxes(title_text="baseline probability p0", row=1, col=1)
fig.update_yaxes(title_text="implied probability p1", row=1, col=1)
fig.update_xaxes(title_text="baseline probability p0", row=1, col=2)
fig.update_yaxes(title_text="risk ratio = p1/p0", row=1, col=2)
fig.update_layout(title="Same OR, different probability changes", height=460)
fig.show()
6) How to interpret the test result#
What a p-value answers (in this context):
If there were truly no association between exposure and outcome (OR = 1), how unusual would it be to see a table at least this extreme (by the chosen test)?
Small p-value (e.g., < 0.05): evidence against OR = 1 (an association is plausible)
Large p-value: the data are consistent with OR = 1 (but this is not proof of no effect)
What the confidence interval answers:
A range of OR values compatible with the data (given the test’s assumptions).
If the CI excludes 1, it aligns with rejecting OR=1 at the corresponding level.
Always report the effect size (OR) and uncertainty (CI), not just the p-value.
7) Practical notes + pitfalls#
The OR treats the two variables symmetrically: transposing the table (swapping the roles of exposure and outcome) leaves it unchanged, while swapping just the rows (or just the columns) inverts it, so check the table's orientation before interpreting.
Case-control studies often estimate OR because sampling is conditional on the outcome.
OR ≠ risk ratio unless the event is rare; for common outcomes OR can look “bigger” than the risk ratio.
Small counts / zeros: prefer Fisher’s exact test (or use continuity corrections / exact logistic regression).
Association ≠ causation: confounding can produce OR ≠ 1 even without a causal effect.
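The OR ≠ risk ratio point is easy to verify numerically (hypothetical probabilities, chosen for illustration):

```python
def odds(p: float) -> float:
    return p / (1.0 - p)

# Rare outcome: OR and risk ratio nearly agree
p0, p1 = 0.01, 0.02
print("rare:   RR =", round(p1 / p0, 3), " OR =", round(odds(p1) / odds(p0), 3))

# Common outcome: OR (2.25) overstates the risk ratio (1.5)
p0, p1 = 0.40, 0.60
print("common: RR =", round(p1 / p0, 3), " OR =", round(odds(p1) / odds(p0), 3))
```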
8) Exercises#
Create a 2×2 table with a zero cell and compare OR and Wald p-values with correction=0 and correction=0.5.
For a fixed OR=2, read off the implied \(p_1\) when \(p_0 \in \{0.01, 0.1, 0.5\}\). Why do the probability changes differ so much?
Implement a function that takes raw individual-level data (two boolean arrays) and returns the 2×2 table, OR, and test results.
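For exercise 3, one possible starting point (a sketch; the row/column order follows the convention used throughout this notebook, but other layouts are equally valid):

```python
import numpy as np

def table_from_arrays(exposed: np.ndarray, event: np.ndarray) -> np.ndarray:
    """2x2 table with rows [exposed, unexposed] and cols [event, no event]."""
    exposed = np.asarray(exposed, dtype=bool)
    event = np.asarray(event, dtype=bool)
    return np.array([
        [np.sum(exposed & event),  np.sum(exposed & ~event)],
        [np.sum(~exposed & event), np.sum(~exposed & ~event)],
    ])

exposed = np.array([True, True, True, False, False])
event   = np.array([True, False, True, False, True])
print(table_from_arrays(exposed, event))  # [[2 1]
                                          #  [1 1]]
```

From there, the table can be passed directly to `odds_ratio_wald_test` and `fisher_exact` above.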
References#
Agresti, An Introduction to Categorical Data Analysis
Fisher’s exact test and the hypergeometric distribution (standard categorical data texts)
Relationship to logistic regression: for a binary predictor, the logistic regression coefficient equals \(\log(\text{OR})\).